31 research outputs found

    Traffic Light Control Using Deep Policy-Gradient and Value-Function Based Reinforcement Learning

    Recent advances in combining deep neural network architectures with reinforcement learning techniques have shown promising results in solving complex control problems with high-dimensional state and action spaces. Inspired by these successes, in this paper we build two kinds of reinforcement learning agents: a deep policy-gradient agent and a value-function-based agent, each of which can predict the best possible traffic signal for a traffic intersection. At each time step, these adaptive traffic light control agents receive a snapshot of the current state of a graphical traffic simulator and produce control signals. The policy-gradient-based agent maps its observation directly to the control signal, whereas the value-function-based agent first estimates values for all legal control signals and then selects the control action with the highest value. Our methods show promising results in a traffic network simulated in the SUMO traffic simulator, without suffering from instability issues during the training process.
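
    Below is a minimal sketch contrasting the two action-selection schemes described above: a policy-gradient agent that maps its observation directly to a distribution over signals, and a value-function agent that scores every legal signal and picks the best. The linear "networks", feature size, and number of phases are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 8   # assumed size of the intersection state snapshot
N_PHASES = 4     # assumed number of legal signal phases

# Toy linear "networks"; the paper uses deep networks, this only
# illustrates how each agent turns a state into a control signal.
policy_weights = rng.normal(size=(N_FEATURES, N_PHASES))
value_weights = rng.normal(size=(N_FEATURES, N_PHASES))

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def policy_gradient_action(state):
    """Map the observation directly to a distribution over signals and sample."""
    probs = softmax(state @ policy_weights)
    return int(rng.choice(N_PHASES, p=probs))

def value_based_action(state):
    """Estimate a value for every legal signal, then pick the highest."""
    q_values = state @ value_weights
    return int(np.argmax(q_values))

state = rng.normal(size=N_FEATURES)  # stand-in for a simulator snapshot
print(policy_gradient_action(state), value_based_action(state))
```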

    Parallel Reinforcement Learning for Traffic Signal Control

    Developing Adaptive Traffic Signal Control strategies for efficient urban traffic management is a challenging problem that is not easily solved. Reinforcement Learning (RL) has been shown to be a promising approach when applied to traffic signal control (TSC) problems. When using RL agents for TSC, difficulties may arise with respect to convergence times and performance. This is especially pronounced on complex intersections with many different phases, due to the increased size of the state-action space. Parallel Learning is an emerging technique in the RL literature that allows several learning agents to pool their experiences while learning concurrently on the same problem. Here we present an extension to a leading published work on RL for TSC, which leverages the benefits of Parallel Learning to increase exploration and reduce delay times and queue lengths.
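
    A minimal sketch of the pooling idea: several tabular Q-learners act concurrently on copies of the same problem and all update from a shared transition buffer. The one-state toy environment, learning parameters, and exact pooling mechanism are assumptions for illustration; the paper's scheme may differ.

```python
import random
from collections import defaultdict

ACTIONS = [0, 1]          # e.g. two signal phases
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def step(state, action):
    """Hypothetical one-state environment: reward depends only on the action."""
    reward = 1.0 if action == 0 else 0.0
    return state, reward

def make_agent():
    return defaultdict(float)   # Q-table keyed by (state, action)

def act(q, state):
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def q_update(q, s, a, r, s2):
    best_next = max(q[(s2, a2)] for a2 in ACTIONS)
    q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])

agents = [make_agent() for _ in range(4)]
shared_pool = []

for episode in range(50):
    # Each agent interacts with its own copy of the environment...
    for q in agents:
        s = 0
        a = act(q, s)
        s2, r = step(s, a)
        shared_pool.append((s, a, r, s2))
    # ...and every agent then learns from the latest pooled experience.
    for q in agents:
        for (s, a, r, s2) in shared_pool[-len(agents):]:
            q_update(q, s, a, r, s2)

print({a: round(agents[0][(0, a)], 2) for a in ACTIONS})
```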

    Multi-Agent Credit Assignment in Stochastic Resource Management Games

    Multi-Agent Systems (MAS) are a form of distributed intelligence, where multiple autonomous agents act in a common environment. Numerous complex, real-world systems have been successfully optimised using Multi-Agent Reinforcement Learning (MARL) in conjunction with the MAS framework. In MARL, agents learn by maximising a scalar reward signal from the environment, and thus the design of the reward function directly affects the policies learned. In this work, we address the issue of appropriate multi-agent credit assignment in stochastic resource management games. We propose two new Stochastic Games to serve as testbeds for MARL research into resource management problems: the Tragic Commons Domain and the Shepherd Problem Domain. Our empirical work evaluates the performance of two commonly used reward shaping techniques: Potential-Based Reward Shaping and difference rewards. Experimental results demonstrate that systems using appropriate reward shaping techniques for multi-agent credit assignment can achieve near-optimal performance in stochastic resource management games, outperforming systems learning from unshaped local or global evaluations. We also present the first empirical investigations into the effect of expressing the same heuristic knowledge in state- or action-based formats, thereby developing insights into the design of multi-agent potential functions that will inform future work.
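
    The two shaping signals evaluated above can be sketched in a few lines: a difference reward that credits each agent with the change in global utility caused by its presence, and a potential-based shaping term F(s, s') = γΦ(s') − Φ(s). The global utility and potential values below are toy stand-ins, not the Tragic Commons or Shepherd domains.

```python
GAMMA = 0.95

def global_utility(actions):
    """Toy congestion-style utility: value drops when agents pick the same resource."""
    used = set(actions)
    return len(used) - 0.5 * (len(actions) - len(used))

def difference_reward(actions, i):
    """Credit for agent i: global utility with and without agent i present."""
    without_i = actions[:i] + actions[i + 1:]
    return global_utility(actions) - global_utility(without_i)

def pbrs(phi_s, phi_s_next, gamma=GAMMA):
    """Potential-based shaping term added to the environment reward."""
    return gamma * phi_s_next - phi_s

joint_actions = [0, 0, 1, 2]
print([round(difference_reward(joint_actions, i), 2) for i in range(4)])
print(round(pbrs(phi_s=1.0, phi_s_next=2.0), 2))
```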

    Distributional Multi-Objective Decision Making

    For effective decision support in scenarios with conflicting objectives, sets of potentially optimal solutions can be presented to the decision maker. We explore both what policies these sets should contain and how such sets can be computed efficiently. With this in mind, we take a distributional approach and introduce a novel dominance criterion that relates the return distributions of policies directly. Based on this criterion, we present the distributional undominated set and show that it contains optimal policies otherwise ignored by the Pareto front. In addition, we propose the convex distributional undominated set and prove that it comprises all policies that maximise expected utility for multivariate risk-averse decision makers. We propose a novel algorithm to learn the distributional undominated set and further contribute pruning operators to reduce the set to the convex distributional undominated set. Through experiments, we demonstrate the feasibility and effectiveness of these methods, making this a valuable new approach for decision support in real-world problems.
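
    The paper's exact dominance criterion is not reproduced in the abstract; as a purely illustrative stand-in, the sketch below checks first-order stochastic dominance between the empirical marginal return distributions of two policies on every objective.

```python
import numpy as np

def empirical_cdf(samples, grid):
    samples = np.asarray(samples)
    return np.array([(samples <= g).mean() for g in grid])

def fosd_dominates(returns_a, returns_b):
    """True if A's returns stochastically dominate B's on every objective."""
    returns_a, returns_b = np.asarray(returns_a), np.asarray(returns_b)
    for obj in range(returns_a.shape[1]):
        grid = np.union1d(returns_a[:, obj], returns_b[:, obj])
        cdf_a = empirical_cdf(returns_a[:, obj], grid)
        cdf_b = empirical_cdf(returns_b[:, obj], grid)
        if np.any(cdf_a > cdf_b):   # A puts more mass on low returns somewhere
            return False
    return True

# Two policies' sampled vector returns (one column per objective).
policy_a = [[3.0, 1.0], [4.0, 2.0], [5.0, 2.0]]
policy_b = [[2.0, 1.0], [3.0, 1.5], [4.0, 2.0]]
print(fosd_dominates(policy_a, policy_b))
```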

    Policy Invariance under Reward Transformations for Multi-Objective Reinforcement Learning

    Reinforcement Learning (RL) is a powerful and well-studied Machine Learning paradigm, where an agent learns to improve its performance in an environment by maximising a reward signal. In multi-objective Reinforcement Learning (MORL), the reward signal is a vector, where each component represents the performance on a different objective. Reward shaping is a well-established family of techniques that have been successfully used to improve the performance and learning speed of RL agents in single-objective problems. The basic premise of reward shaping is to add an additional shaping reward to the reward naturally received from the environment, in order to incorporate domain knowledge and guide an agent’s exploration. Potential-Based Reward Shaping (PBRS) is a specific form of reward shaping that offers additional guarantees. In this paper, we extend the theoretical guarantees of PBRS to MORL problems. Specifically, we provide theoretical proof that PBRS does not alter the true Pareto front in both single- and multi-agent MORL. We also contribute the first published empirical studies of the effect of PBRS in single- and multi-agent MORL problems.
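
    A minimal sketch of potential-based shaping with a vector reward, as studied above: each objective gets its own (assumed) potential, and the shaping term γΦ(s') − Φ(s) is added component-wise to the environment's reward vector, which by the paper's result leaves the true Pareto front unchanged.

```python
import numpy as np

GAMMA = 0.95

def potential(state):
    """Assumed per-objective potential over states (toy values)."""
    table = {
        "start":  np.array([0.0, 0.0]),
        "midway": np.array([1.0, 0.5]),
        "goal":   np.array([2.0, 1.0]),
    }
    return table[state]

def shaped_reward(reward_vec, state, next_state, gamma=GAMMA):
    # F(s, s') = gamma * Phi(s') - Phi(s), added to each objective's reward
    shaping = gamma * potential(next_state) - potential(state)
    return np.asarray(reward_vec) + shaping

print(shaped_reward([0.0, -1.0], "start", "midway"))
```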

    A practical guide to multi-objective reinforcement learning and planning

    Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods and who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.
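
    The point about simple linear combinations can be made concrete with a small illustrative example: a policy whose value vector lies in a concave region of the Pareto front is Pareto-optimal yet is never selected by any linear weighting of the objectives. The value vectors below are made-up numbers, not taken from the paper.

```python
import numpy as np

policies = {
    "A": np.array([1.0, 0.0]),
    "B": np.array([0.4, 0.4]),   # Pareto-optimal, but in a concave region
    "C": np.array([0.0, 1.0]),
}

best_under_some_weight = set()
for w1 in np.linspace(0.0, 1.0, 101):
    weights = np.array([w1, 1.0 - w1])
    best = max(policies, key=lambda name: weights @ policies[name])
    best_under_some_weight.add(best)

# Contains only 'A' and 'C': B is never chosen by any linear weighting.
print(best_under_some_weight)
```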

    Observations on the shortest independent loop set algorithm

    The shortest independent loop set (SILS) algorithm is a widely adopted loop selection method in eigenvalue elasticity analysis for identifying dominant loops. However, we find that, in an individual-based model, SILS cannot identify the maximal independent feedback loops, which in turn causes the loop analysis method to fail. This paper exemplifies this issue with an entire family of such models and proposes research problems for further study.

    Particle swarm optimisation with gradually increasing directed neighbourhoods

    Particle swarm optimisation (PSO) is an intelligent random search algorithm, and the key to its success is to balance effectively between exploration of the solution space in the early stages and exploitation of the solution space in the late stages. This paper presents a new dynamic topology called "gradually increasing directed neighbourhoods (GIDN)" that provides an effective way to balance between exploration and exploitation throughout the iteration process. In our model, each particle begins with a small number of connections, so there are many small isolated swarms, which improves exploration. At each iteration, we gradually add a number of new connections between particles, which gradually improves exploitation. Furthermore, these connections among particles are created randomly and have directions. We formalise this topology using random graph representations. Experiments are conducted on 31 benchmark test functions to validate the proposed topology. The results show that PSO with GIDN performs much better than a number of state-of-the-art algorithms on almost all of the 31 functions.
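
    A compact sketch of the GIDN idea: particles start with only self-connections, a few new random directed links are added each iteration, and each particle's social attractor is the best personal best among the particles it listens to. The growth schedule, coefficients, and test function are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(1)

N, DIM, ITERS = 20, 5, 100
W, C1, C2 = 0.7, 1.5, 1.5
LINKS_PER_ITER = 2                      # new directed links added each iteration

def sphere(x):                          # toy objective to minimise
    return float(np.sum(x ** 2))

pos = rng.uniform(-5, 5, size=(N, DIM))
vel = np.zeros((N, DIM))
pbest = pos.copy()
pbest_val = np.array([sphere(p) for p in pos])

# adjacency[i, j] = True means particle i listens to particle j
adjacency = np.eye(N, dtype=bool)       # initially only self-connections

for t in range(ITERS):
    # gradually add random directed connections between particles
    for _ in range(LINKS_PER_ITER):
        i, j = rng.integers(N), rng.integers(N)
        adjacency[i, j] = True

    for i in range(N):
        neighbours = np.where(adjacency[i])[0]
        nbest = pbest[neighbours[np.argmin(pbest_val[neighbours])]]
        r1, r2 = rng.random(DIM), rng.random(DIM)
        vel[i] = W * vel[i] + C1 * r1 * (pbest[i] - pos[i]) + C2 * r2 * (nbest - pos[i])
        pos[i] += vel[i]
        val = sphere(pos[i])
        if val < pbest_val[i]:
            pbest[i], pbest_val[i] = pos[i].copy(), val

print(round(float(pbest_val.min()), 4))
```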

    The influence of random interactions and decision heuristics on norm evolution in social networks

    In this paper we explore the effect that random social interactions have on the emergence and evolution of social norms in a simulated population of agents. In our model, agents observe the behaviour of others and update their norms based on these observations. An agent's norm is influenced both by its own fixed social network and by a second random network composed of a subset of the remaining population. Random interactions are based on a weighted selection algorithm that uses an individual's path distance on the network to determine their chance of meeting a stranger. This means that friends-of-friends are more likely to randomly interact with one another than agents with a higher degree of separation. We then contrast the case where agents make rational, highest-utility decisions about which norm to adopt with the case where they use a Markov Decision Process that associates a weight with the best choice. Finally, we examine the effect that these random interactions have on the evolution of a more complex social norm as it propagates throughout the population. We find that increasing the frequency and weighting of random interactions results in higher levels of norm convergence, reached in less time, when agents have a choice between two competing alternatives. This can be attributed to more information passing through the population, allowing for quicker convergence. When the norm is allowed to evolve, we observe both global consensus formation and group splintering, depending on the cognitive agent model used.
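
    The weighted stranger-selection mechanism can be sketched as follows: an agent's chance of randomly interacting with another agent decreases with their shortest-path distance on the fixed social network, so friends-of-friends are the most likely strangers. The 1/distance weighting and the small example network below are assumptions for illustration.

```python
import random
from collections import deque

network = {            # fixed social network as an adjacency list
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2, 4],
    4: [3],
}

def path_distances(source):
    """Breadth-first shortest-path distances from source to all reachable agents."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in network[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def sample_stranger(agent):
    """Pick a non-neighbour, weighting closer agents more heavily."""
    dist = path_distances(agent)
    candidates = [a for a, d in dist.items() if d >= 2]   # exclude self and friends
    weights = [1.0 / dist[a] for a in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

random.seed(0)
print([sample_stranger(0) for _ in range(5)])
```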